ON-LINE DIFFERENCE MAXIMIZATION*

MING-YANG KAO† AND STEPHEN R. TATE‡

Abstract. In this paper we examine problems motivated by on-line financial problems and 
stochastic games. In particular, we consider a sequence of entirely arbitrary distinct values arriving 
in random order, and must devise strategies for selecting low values followed by high values in such 
a way as to maximize the expected gain in rank from low values to high values. 

First, we consider a scenario in which only one low value and one high value may be selected. 
We give an optimal on-line algorithm for this scenario, and analyze it to show that, surprisingly, 
the expected gain is $n - O(1)$, and so differs from the best possible off-line gain by only a constant 
additive term (which is, in fact, fairly small: at most 15). 

In a second scenario, we allow multiple nonoverlapping low/high selections, where the total gain 
for our algorithm is the sum of the individual pair gains. We also give an optimal on-line algorithm 
for this problem, where the expected gain is $n^2/8 - O(n \log n)$. An analysis shows that the optimal 
expected off-line gain is $n^2/6 + O(1)$, so the performance of our on-line algorithm is within a factor 
of 3/4 of the best off-line strategy. 

Key words. analysis of algorithms, on-line algorithms, financial games, secretary problem 

AMS subject classifications. 68Q20, 68Q25 

PII. S0895480196307445 

1. Introduction. In this paper, we examine the problem of accepting values 
from an on-line source and selecting values in such a way as to maximize the difference 
in the ranks of the selected values. The input values can be arbitrary distinct real 
numbers, and thus we cannot determine with certainty the actual ranks of any input 
values until we see all of them. Since we only care about their ranks, an equivalent way 
of defining the input is as a sequence of $n$ integers $x_1, x_2, \ldots, x_n$, where $1 \le x_i \le i$ for 
all $i \in \{1, \ldots, n\}$, and input $x_i$ denotes the rank of the $i$th input item among the first $i$ 
items. These ranks uniquely define an ordering of all $n$ inputs, which can be specified 
with a sequence of ranks $r_1, r_2, \ldots, r_n$, where these ranks form a permutation of the 
set $\{1, 2, \ldots, n\}$. We refer to the ranks $r_i$ as final ranks, since they represent the rank 
of each item among the final set of n inputs. We assume that the inputs come from 
a probabilistic source such that all permutations of n final ranks are equally likely. 
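
To make the two encodings of the input concrete, the following Python sketch converts a sequence of relative ranks $x_1, \ldots, x_n$ into the final ranks $r_1, \ldots, r_n$ by inserting each item into a sorted list. The function name `final_ranks` is a hypothetical helper introduced here for illustration; it does not come from the paper.

```python
def final_ranks(x):
    """Convert relative ranks into final ranks.

    x[i-1] is the rank of item i among the first i items (1 <= x[i-1] <= i).
    Returns a list r where r[i-1] is the rank of item i among all n items.
    """
    order = []  # item indices, sorted from lowest value to highest
    for i, xi in enumerate(x, start=1):
        if not 1 <= xi <= i:
            raise ValueError(f"x_{i} must lie in 1..{i}")
        # Item i has rank xi among the first i items, so it lands at index xi - 1.
        order.insert(xi - 1, i)
    ranks = [0] * len(x)
    for rank, item in enumerate(order, start=1):
        ranks[item - 1] = rank
    return ranks


print(final_ranks([1, 1, 3, 2]))  # [3, 1, 4, 2]
```

Each of the $n!$ admissible sequences $x$ maps to a distinct permutation, which is why drawing the $x_i$ independently and uniformly yields a uniformly random permutation of final ranks.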

The original motivation for this problem came from considering on-line financial 
problems, where maximizing the difference between selected items naturally 
corresponds to maximizing the difference between the buying and selling prices 
of an investment. While we use generic terminology in order to generalize the setting 
(for example, we make a "low selection" rather than pick a "buying price"), many of 
the problems examined in this paper are easily understood using notions from invest- 
ing. This paper is a first step in applying on-line algorithmic techniques to realistic 
on-line investment problems. 

While the original motivation comes from financial problems, the current input 
model has little to do with realistic financial markets, and is selected for its mathematical 
cleanness and its relation to fundamental problems in stochastic games. The 
main difference between our model and more realistic financial problems is that in 
usual stock trading, optimizing rank-related quantities is not always correlated with 
optimizing profits in dollar amounts. However, there are some strong similarities 
as well, such as exotic financial derivatives based on quantities similar to ranks. 

The current formulation is closely related to an important mathematical problem 
known as the secretary problem, which has become a standard textbook 
example [3], [5], and has been the basis for many interesting extensions (including 
[1], [14], [15]). The secretary problem comes from the following scenario: 
A set of candidates for a single secretarial position are presented in random order. 
The interviewer sees the candidates one at a time, and must make a decision to hire 
or not to hire immediately upon seeing each candidate. Once a candidate is passed 
over, the interviewer may not go back and hire that candidate. The general goal is to 
maximize either the probability of selecting the top candidate, or the expected rank of 
the selected candidate. This problem has also been stated with the slightly different 
story of a princess selecting a suitor. More will be made of the relationship 
between our current problem and the secretary problem in §2, and for further reading 
on the secretary problem, we refer the reader to the survey by Freeman. 

As mentioned above, we assume that the input comes from a random source in 
which all permutations of final ranks $1, 2, \ldots, n$ are equally likely. Thus, each rank 
$x_i$ is uniformly distributed over the set $\{1, 2, \ldots, i\}$, and all ranks are independent of 
one another. In fact, this closely parallels the most popular algorithm for generating 
a random permutation [11, p. 139]. A natural question to ask is, knowing the relative 
rank $x_i$ of the current input, what is the expected final rank of this item (i.e., $E[r_i \mid x_i]$)? 
Due to the uniform nature of the input source, the final rank of the $i$th item simply 
scales up with the number of items left in the input sequence, and so $E[r_i \mid x_i] = \frac{n+1}{i+1} x_i$ 
(a simple proof of this is given in Appendix A). 
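
The scaling rule $E[r_i \mid x_i] = \frac{n+1}{i+1} x_i$ can be checked exhaustively for small $n$. The sketch below is an illustration only: the helper names `final_ranks` and `conditional_mean_final_rank` are hypothetical, and the check simply averages over all $n!$ equally likely relative-rank sequences using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def final_ranks(x):
    """Final ranks r_1..r_n implied by relative ranks x_1..x_n."""
    order = []
    for i, xi in enumerate(x, start=1):
        order.insert(xi - 1, i)
    ranks = [0] * len(x)
    for rank, item in enumerate(order, start=1):
        ranks[item - 1] = rank
    return ranks


def conditional_mean_final_rank(n, i, xi):
    """Exact E[r_i | x_i = xi] over all equally likely sequences of length n."""
    total, count = 0, 0
    for x in product(*(range(1, j + 1) for j in range(1, n + 1))):
        if x[i - 1] == xi:
            total += final_ranks(x)[i - 1]
            count += 1
    return Fraction(total, count)


n = 5
for i in range(1, n + 1):
    for xi in range(1, i + 1):
        expected = Fraction(n + 1, i + 1) * xi
        assert conditional_mean_final_rank(n, i, xi) == expected
print("E[r_i | x_i] = (n+1)/(i+1) * x_i holds for n =", n)
```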

Since all input ranks $x_i$ are independent and uniformly distributed, little can 
be inferred about the future inputs. We consider games in which a player watches 
the stream of inputs, and can select items as they are seen; however, if an item is 
passed up then it is gone for good and may not be selected later. We are interested 
in strategies for two such games: 

• Single pair selection: In this game, the player should make two selections, the 
first being the low selection and the second being the high selection. The goal 
of the player is to maximize the difference between the final ranks of these 
two selections. If the player picks the low selection upon seeing input $x_\ell$ at 
time step $\ell$, and picks the high selection as input $x_h$ at time step $h$, then the 
profit given to the player at the end of the game is the difference in final ranks 
of these items: $r_h - r_\ell$. 

• Multiple pair selection: In this game, the player makes multiple choices of 
low/high pairs. At the end of the game the difference in final ranks of each 
selected pair of items is taken, and the differences for all pairs are added up 
to produce the player's final profit. 

The strategies for these games share a common difficulty: If the player waits too long 
to make the low selection, he risks not having enough choices for a good high selection; 
however, making the low selection too early may result in an item selected before any 
truly low items have been seen. The player in the second game can afford to be less 
selective. If one chosen pair does not give a large difference, there may still be many 
other pairs that are good enough to make up for this pair's small difference. 

We present optimal solutions to both of the games. For the first game, where 
the player makes a single low selection and a single high selection, our strategy has 
expected profit $n - O(1)$. From the derivation of our strategy, it will be clear that the 
strategy is optimal. Even with full knowledge of the final ranks of all input items, the 
best expected profit in this game is less than $n$, and so in standard terms of on-line 
performance measurement [12], [16], the competitive ratio¹ of our strategy is one. The 
strength of our on-line strategy is rather intriguing. 

For the second game, where multiple low/high pairs are selected, we provide an 
optimal strategy with expected profit $\frac{1}{8} n^2 - O(n \log n)$. For this problem, the optimal 
off-line strategy has expected profit of approximately $\frac{1}{6} n^2$, and so the competitive 
ratio of our strategy is $\frac{4}{3}$. 

2. Single Low/High Selection. This section considers a scenario in which the 
player may pick a single item as the low selection, and a single later item as the high 
selection. If the low selection is made at time step $\ell$ and the high selection is made 
at time step $h$, then the expected profit is $E[r_h - r_\ell]$. The player's goal is to use a 
strategy for picking $\ell$ and $h$ in order to maximize this expected profit. 

As mentioned in the previous section, this problem is closely related to the sec- 
retary problem. A great deal of work has been done on the secretary problem and 
its variations, and this problem has taken a fundamental role in the study of games 
against a stochastic opponent. Our work extends the secretary problem, and gives 
complete solutions to two natural variants that have not previously appeared in the 
literature. 

Much insight can be gained by looking at the optimal solution to the secretary 
problem, so we first sketch that solution below (using terminology from our problem 
about a "high selection"). To maximize the expected rank of a single high selection, 
we define the optimal strategy recursively using the following two functions: 

$\mathcal{H}_n(i)$: This is a limit such that the player selects the current item if 
$x_i \ge \mathcal{H}_n(i)$. 

$R_n(i)$: This is the expected final rank of the high selection if the optimal 
strategy is followed starting at the $i$th time step. 

Since all permutations of the final ranks are equally likely, if the $i$th input item 
has rank $x_i$ among the first $i$ data items, then its expected final rank is $\frac{n+1}{i+1} x_i$. Thus, 
an optimal strategy for the secretary problem is to select the $i$th input item if and 
only if its expected final rank is at least as good as could be obtained by passing over this 
item and using the optimal strategy from step $i + 1$ on. In other words, select the 
item at time step $i < n$ if and only if 

\[ \frac{n+1}{i+1} x_i \ge R_n(i+1). \]

If we have not made a selection before the $n$th step, then we must select the last item, 
whose rank is uniformly distributed over the range of integers from 1 to $n$, and so 
the expected final rank in that case is $R_n(n) = \frac{n+1}{2}$. For $i < n$ we can also define 

\[ \mathcal{H}_n(i) = \left\lceil \frac{i+1}{n+1} R_n(i+1) \right\rceil, \]

and to force selection at the last time step define $\mathcal{H}_n(n) = 0$. Furthermore, given this 
definition for $\mathcal{H}_n(i)$, the optimal strategy at step $i$ depends only on the rank of the 
current item (which is uniformly distributed over the range $1, \ldots, i$) and the optimal 
strategy at time $i + 1$. This allows us to recursively define $R_n(i)$ as follows when $i < n$: 

\begin{align*}
R_n(i) &= \frac{\mathcal{H}_n(i) - 1}{i} R_n(i+1) + \sum_{j=\mathcal{H}_n(i)}^{i} \frac{1}{i} \cdot \frac{n+1}{i+1} \, j \\
&= \frac{\mathcal{H}_n(i) - 1}{i} R_n(i+1) + \frac{n+1}{i(i+1)} \cdot \frac{(i + \mathcal{H}_n(i))(i - \mathcal{H}_n(i) + 1)}{2} \\
&= \frac{\mathcal{H}_n(i) - 1}{i} \left( R_n(i+1) - \frac{n+1}{2(i+1)} \mathcal{H}_n(i) \right) + \frac{n+1}{2}.
\end{align*}

Since $\mathcal{H}_n(n) = 0$ and $R_n(n) = \frac{n+1}{2}$, we have a full recursive specification of both the 
optimal strategy and the performance of the optimal strategy. The performance of 
the optimal strategy, taken from the beginning, is $R_n(1)$. This value can be computed 
by the recursive equations, and was proved by Chow et al. to tend to $n + 1 - c$, for 
$c \approx 3.8695$, as $n \to \infty$ [6]. Furthermore, the performance approaches this limit from 
above, so for all $n$ we have performance greater than $n - 2.87$. 

¹ "Competitive ratio" usually refers to the worst-case ratio of on-line to off-line cost; however, 
in our case inputs are entirely probabilistic, so our "competitive ratio" refers to expected on-line to 
expected off-line cost; a worst-case measure does not even make sense here. 

For single pair selection, once a low selection is made we want to maximize the 
expected final rank of the high selection. If we made the low selection at step i, 
then we can optimally make the high selection by following the above strategy for the 
secretary problem, which results in an expected high selection rank of $R_n(i+1)$. How 
do we make the low selection? We can do this optimally by extending the recursive 
definitions given above with two new functions: 

$\mathcal{L}_n(i)$: This is a limit such that the player selects the current item if 
$x_i \le \mathcal{L}_n(i)$. 

$P_n(i)$: This is the expected high-low difference if the optimal strategy for 
making the low and high selections is followed starting at step $i$. 

Thus, if we choose the $i$th input as the low selection, the expected profit is 
$R_n(i+1) - \frac{n+1}{i+1} x_i$. We should select this item if that expected profit is no less than the 
expected profit if we skip this item. This leads to the definition of $\mathcal{L}_n(i)$: 

\[ \mathcal{L}_n(i) = \begin{cases} 0 & \text{if } i = n, \\ \left\lfloor \frac{i+1}{n+1} \bigl( R_n(i+1) - P_n(i+1) \bigr) \right\rfloor & \text{if } i < n. \end{cases} \]

Using $\mathcal{L}_n(i)$, we derive the following profit function: 

\[ P_n(i) = \begin{cases} 0 & \text{if } i = n, \\ P_n(i+1) + \frac{\mathcal{L}_n(i)}{i} \left( R_n(i+1) - P_n(i+1) - \frac{n+1}{i+1} \cdot \frac{\mathcal{L}_n(i) + 1}{2} \right) & \text{if } i < n. \end{cases} \]

From the derivation, it is clear that this is the optimal strategy, and can be implemented 
by using the recursive formulas to compute the $\mathcal{L}_n(i)$ values. The expected 
profit of our algorithm is given by $P_n(1)$, which is bounded in the following theorem. 
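
As a sketch of how these recursive formulas might be evaluated, the Python code below computes the low-selection limits, the profit function, and the secretary values together, working backward from step $n$. The name `single_pair_tables` is hypothetical, and the code assumes the floor and ceiling forms of the two limits as written above.

```python
from fractions import Fraction
from math import ceil, floor


def single_pair_tables(n):
    """Return (L, P, R) as dicts indexed by time step 1..n."""
    R = {n: Fraction(n + 1, 2)}   # expected final rank of the high selection
    P = {n: Fraction(0)}          # expected high-low difference
    L = {n: 0}                    # low-selection limits
    for i in range(n - 1, 0, -1):
        scale = Fraction(n + 1, i + 1)  # E[r_i | x_i] = scale * x_i
        # Take item i as the low selection iff x_i <= low.
        low = floor((R[i + 1] - P[i + 1]) / scale)
        L[i] = low
        P[i] = P[i + 1] + Fraction(low, i) * (
            R[i + 1] - P[i + 1] - scale * Fraction(low + 1, 2)
        )
        # Secretary recursion (closed form of the sum) for the high selection.
        h = ceil(R[i + 1] / scale)
        R[i] = Fraction(h - 1, i) * R[i + 1] + scale * Fraction(
            (i + h) * (i - h + 1), 2 * i
        )
    return L, P, R


L, P, R = single_pair_tables(60)
print(float(60 - P[1]))  # shortfall from n; Theorem 2.1 bounds it by a constant
```

Both tables are filled in a single backward pass, so the whole strategy costs $O(n)$ arithmetic operations on rationals.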

Theorem 2.1. Our on-line algorithm for single low/high selection is optimal and 
has expected profit $n - O(1)$. 

Proof. It suffices to prove that a certain inferior algorithm has expected profit $n - O(1)$. 
The inferior algorithm is as follows: Use the solution to the secretary problem 
to select, from the first $\lfloor n/2 \rfloor$ input items, an item with the minimum expected final 
rank. Similarly, pick an item with maximum expected rank from the second $\lceil n/2 \rceil$ 
inputs. For simplicity, we initially assume that $n$ is even; see comments at the end of 
the proof for odd $n$. Let $\ell$ be the time step in which the low selection is made, and 
$h$ the time step in which the high selection is made. Using the bounds from Chow et 
al. [6], we can bound the expected profit of this inferior algorithm by 

\begin{align*}
E[r_h - r_\ell] = E[r_h] - E[r_\ell] &\ge \frac{n+1}{n/2+1} (n/2 + 1 - c) - \frac{n+1}{n/2+1} \cdot c \\
&= \frac{n+1}{n+2} (n + 2 - 4c) = n + 1 - 4c + \frac{4c}{n+2}.
\end{align*}

Chow et al. [6] show that $c < 3.87$, and so the expected profit of the inferior algorithm 
is at least $n - 14.48$. For odd $n$, the derivation is almost identical, with only a change 
in the least significant term; specifically, the expected profit of the inferior algorithm 
for odd $n$ is $n + 1 - 4c + \frac{4c}{n+3}$, which again is at least $n - 14.48$. □ 
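
For readers who want the odd case spelled out, here is one way to carry out the computation. It assumes that the low selection is made among the first $(n-1)/2$ inputs and the high selection among the last $(n+1)/2$ inputs, and that the same constant $c$ bounds the loss of the secretary strategy in both blocks.

```latex
\begin{align*}
E[r_\ell] &\le \frac{n+1}{(n-1)/2 + 1} \cdot c = 2c, \\
E[r_h] &\ge \frac{n+1}{(n+1)/2 + 1} \left( \frac{n+1}{2} + 1 - c \right)
        = n + 1 - \frac{2c(n+1)}{n+3}, \\
E[r_h - r_\ell] &\ge n + 1 - 2c - \frac{2c(n+1)}{n+3}
        = n + 1 - 4c + \frac{4c}{n+3}.
\end{align*}
```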

3. Multiple Low/High Selection. This section considers a scenario in which 
the player again selects a low item followed by a high item, but may repeat this 
process as often as desired. If the player makes $k$ low and high selections at time 
steps $\ell_1, \ell_2, \ldots, \ell_k$ and $h_1, h_2, \ldots, h_k$, respectively, then we require that 

\[ 1 \le \ell_1 < h_1 < \ell_2 < h_2 < \cdots < \ell_k < h_k \le n. \]

The expected profit resulting from these selections is 

\[ E[r_{h_1} - r_{\ell_1}] + E[r_{h_2} - r_{\ell_2}] + \cdots + E[r_{h_k} - r_{\ell_k}]. \]

3.1. Off-line Analysis. Let interval j refer to the time period between the 
instant of input item j arriving and the instant of input item j + 1 arriving. For a 
particular sequence of low and high selections, we call interval $j$ active if $\ell_i \le j < h_i$ 
for some index $i$. We then amortize the total profit of a particular algorithm $B$ by 
defining the amortized profit $A_B(j)$ for interval $j$ to be 

\[ A_B(j) = \begin{cases} r_{j+1} - r_j & \text{if interval $j$ is active,} \\ 0 & \text{otherwise.} \end{cases} \]

Note that for a fixed sequence of low/high selections, the sum of all amortized profits 
is exactly the total profit, i.e., 

\begin{align*}
\sum_{j=1}^{n-1} A_B(j) &= \sum_{j=\ell_1}^{h_1-1} (r_{j+1} - r_j) + \sum_{j=\ell_2}^{h_2-1} (r_{j+1} - r_j) + \cdots + \sum_{j=\ell_k}^{h_k-1} (r_{j+1} - r_j) \\
&= (r_{h_1} - r_{\ell_1}) + (r_{h_2} - r_{\ell_2}) + \cdots + (r_{h_k} - r_{\ell_k}).
\end{align*}
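
The telescoping identity can be illustrated with a few lines of Python. The function names `total_profit` and `amortized_profit`, and the sample data, are hypothetical; time steps are 1-indexed as in the text while the list `r` is 0-indexed.

```python
def total_profit(r, pairs):
    """Sum of r_h - r_l over the selected (l, h) pairs."""
    return sum(r[h - 1] - r[l - 1] for l, h in pairs)


def amortized_profit(r, pairs):
    """Sum of r_{j+1} - r_j over all active intervals j = 1..n-1."""
    total = 0
    for j in range(1, len(r)):
        # Interval j is active if it lies between some low and its high selection.
        if any(l <= j < h for l, h in pairs):
            total += r[j] - r[j - 1]
    return total


r = [3, 1, 4, 2, 6, 5]      # final ranks of six items
pairs = [(2, 3), (4, 5)]    # (low step, high step) selections
print(total_profit(r, pairs), amortized_profit(r, pairs))  # 7 7
```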

For an off-line algorithm to maximize the total profit we need to maximize the 
amortized profit, which is done for a particular sequence of $r_i$'s by making interval 
$j$ active if and only if $r_{j+1} > r_j$. Translating this back to the original problem of 
making low and high selections, this is equivalent to identifying all maximal-length 
increasing intervals and selecting the beginning and ending points of these intervals 
as low and high selections, respectively. These observations and some analysis give 
the following lemma. 

Lemma 3.1. The optimal off-line algorithm just described has expected profit 
$\frac{1}{6}(n^2 - 1)$. 

Proof. This analysis is performed by examining the expected amortized profits 
for individual intervals. In particular, for any interval $j$, 

\begin{align*}
E[A_{OFF}(j)] &= \Pr[r_{j+1} > r_j] \cdot E[A_{OFF}(j) \mid r_{j+1} > r_j] + \Pr[r_{j+1} < r_j] \cdot E[A_{OFF}(j) \mid r_{j+1} < r_j] \\
&= \frac{1}{2} \cdot E[r_{j+1} - r_j \mid r_{j+1} > r_j] + \frac{1}{2} \cdot 0 \\
&= \frac{1}{2} \sum_{i=1}^{n} \sum_{k=i+1}^{n} \Pr[r_{j+1} = k \text{ and } r_j = i \mid r_{j+1} > r_j] \cdot (k - i) \\
&= \frac{1}{2} \sum_{i=1}^{n} \sum_{k=i+1}^{n} \frac{2}{n(n-1)} (k - i) \\
&= \frac{1}{2} \cdot \frac{2}{n(n-1)} \cdot \frac{(n+1)n(n-1)}{6} \\
&= \frac{n+1}{6}.
\end{align*}

Since there are $n - 1$ intervals and the above analysis is independent of the interval 
number $j$, summing the amortized profit over all intervals gives the expected profit 
stated in the lemma. □ 
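
Lemma 3.1 can be confirmed by brute force for small $n$. The sketch below is only an illustration: `offline_profit` and `expected_offline_profit` are hypothetical names, and the off-line rule is implemented directly as "hold across every interval in which the final rank rises".

```python
from fractions import Fraction
from itertools import permutations


def offline_profit(r):
    """Profit of the off-line rule: sum of all positive consecutive differences."""
    return sum(max(0, b - a) for a, b in zip(r, r[1:]))


def expected_offline_profit(n):
    """Exact average of the off-line profit over all n! orderings."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(offline_profit(p) for p in perms), len(perms))


for n in range(2, 8):
    assert expected_offline_profit(n) == Fraction(n * n - 1, 6)
print("expected off-line profit equals (n^2 - 1)/6 for n = 2..7")
```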

3.2. On-line Analysis. In our on-line algorithm for multiple pair selection, 
there are two possible states: free and holding. In the free state, we choose the 
current item as a low selection if $x_i < \frac{i+1}{2}$; furthermore, if we select an item then we 
move from the free state into the holding state. On the other hand, in the holding 
state if the current item has $x_i > \frac{i+1}{2}$ then we choose this item as a high selection 
and move into the free state. We name this algorithm OP, which can stand for 
"opportunistic" since this algorithm makes a low selection whenever the probability 
is greater than $\frac{1}{2}$ that the next input item will be greater than this one. Later we will 
see that the name OP could just as well stand for "optimal," since this algorithm is 
indeed optimal. 
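
The two-state rule translates directly into code. The following Python sketch is a minimal rendering of algorithm OP as described above; the function name `op_selections` and the convention of returning any unmatched low selection separately are choices made here for illustration.

```python
from fractions import Fraction


def op_selections(x):
    """Run OP on relative ranks x (x[i-1] = rank of item i among the first i).

    Returns (pairs, open_low): the completed (low_step, high_step) pairs and
    the step of a low selection still unmatched at the end (or None).
    """
    pairs, open_low = [], None
    for i, xi in enumerate(x, start=1):
        threshold = Fraction(i + 1, 2)
        if open_low is None and xi < threshold:
            open_low = i                  # free -> holding: low selection
        elif open_low is not None and xi > threshold:
            pairs.append((open_low, i))   # holding -> free: high selection
            open_low = None
    return pairs, open_low


print(op_selections([1, 1, 3, 2, 5, 1]))  # ([(2, 3), (4, 5)], 6)
```

When $x_i$ equals $\frac{i+1}{2}$ exactly, neither branch fires and the state is left unchanged, which matches the description in the text.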

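The following lemma gives the expected profit of this algorithm. In the proof of 
this lemma we use the following equality, where $H_m = \sum_{k=1}^{m} \frac{1}{k}$ denotes the $m$th 
harmonic number (with $H_0 = 0$): 

\[ \sum_{k=1}^{i} \frac{1}{2k+1} = H_{2i+1} - \frac{1}{2} H_i - 1. \]

Lemma 3.2. The expected profit from our on-line algorithm is 

\[ E[P_{OP}] = \begin{cases} \frac{n+1}{8} \left( n + H_{(n-2)/2} - 2 H_{n-1} \right) & \text{if $n$ is even,} \\ \frac{n+1}{8} \left( n + H_{(n-1)/2} - 2 H_n + \frac{1}{n} \right) & \text{if $n$ is odd.} \end{cases} \]

In cleaner forms we have $E[P_{OP}] = \frac{n+1}{8} (n - H_n + O(1)) = \frac{1}{8} n^2 - O(n \log n)$. 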

Proof. Let $R_i$ be the random variable of the final rank of the $i$th input item. Let 
$A_{OP}(i)$ be the amortized profit for interval $i$ as defined in §3.1. Since $A_{OP}(i)$ is nonzero 
only when interval $i$ is active, 

\begin{align*}
E[A_{OP}(i)] &= E[A_{OP}(i) \mid \text{Interval $i$ is active}] \cdot \mathrm{Prob}[\text{Interval $i$ is active}] \\
&= E[R_{i+1} - R_i \mid \text{Interval $i$ is active}] \cdot \mathrm{Prob}[\text{Interval $i$ is active}].
\end{align*}

Therefore, 

\begin{align*}
E[P_{OP}] &= \sum_{i=1}^{n-1} E[A_{OP}(i)] \\
&= \sum_{i=1}^{n-1} E[R_{i+1} - R_i \mid \text{Interval $i$ is active}] \cdot \mathrm{Prob}[\text{Interval $i$ is active}].
\end{align*}

Under what conditions is an interval active? If $x_i < \frac{i+1}{2}$, this interval is certainly 
active: if the algorithm was not in the holding state prior to this step, it would be 
after seeing input $x_i$. Similarly, if $x_i > \frac{i+1}{2}$, the algorithm must be in the free state 
during this interval, and so the interval is not active. Finally, if $x_i = \frac{i+1}{2}$, the state 
remains what it has been for interval $i - 1$. Furthermore, since $i$ must be odd for this 
case to be possible, $i - 1$ is even, and $x_{i-1}$ cannot be $\frac{i}{2}$ (and thus $x_{i-1}$ unambiguously 
indicates whether interval $i$ is active). In summary, determining whether interval $i$ is 
active requires looking at only $x_i$ and occasionally $x_{i-1}$. Since the expected amortized 
profit of step $i$ depends on whether $i$ is odd or even, we break the analysis up into 
these two cases below. 

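Case 1: $i$ is even. Note that $\mathrm{Prob}[x_i < \frac{i+1}{2}] = \frac{1}{2}$, and $x_i$ cannot be exactly $\frac{i+1}{2}$, 
which means that with probability $\frac{1}{2}$ interval $i$ is active. Furthermore, $R_{i+1}$ 
is independent of whether interval $i$ is active or not, and so 

\begin{align*}
E[A_{OP}(i) \mid \text{Interval $i$ is active}] &= E[R_{i+1}] - E[R_i \mid \text{Interval $i$ is active}] \\
&= \frac{n+1}{2} - \frac{n+1}{i+1} \sum_{j=1}^{i/2} \frac{2}{i} \, j \\
&= \frac{n+1}{2} - \frac{n+1}{i+1} \cdot \frac{2}{i} \cdot \frac{i(i+2)}{8} \\
&= \frac{n+1}{4} \cdot \frac{i}{i+1}.
\end{align*}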

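Case 2: $i$ is odd. Since interval 1 cannot be active, we assume that $i \ge 3$. We need 
to consider the case in which $x_i = \frac{i+1}{2}$, and so 

\begin{align*}
\mathrm{Prob}[\text{Interval $i$ is active}] &= \mathrm{Prob}\left[x_i < \tfrac{i+1}{2}\right] + \mathrm{Prob}\left[x_i = \tfrac{i+1}{2}\right] \cdot \mathrm{Prob}\left[x_{i-1} < \tfrac{i}{2}\right] \\
&= \frac{i-1}{2i} + \frac{1}{i} \cdot \frac{1}{2} = \frac{1}{2}.
\end{align*}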

Computing the expected amortized profit of interval $i$ is slightly more complex 
than in Case 1. 

\begin{align*}
E[A_{OP}(i) \mid \text{Interval $i$ is active}] &= E[R_{i+1}] - E[R_i \mid \text{Interval $i$ is active}] \\
&= \frac{n+1}{2} - E[R_i \mid \text{Interval $i$ is active}] \\
&= \frac{n+1}{2} - \frac{n+1}{i+1} \left( \sum_{j=1}^{(i-1)/2} \frac{2}{i} \, j + \frac{1}{i} \cdot \frac{i+1}{2} \right) \\
&= \frac{n+1}{2} - \frac{n+1}{i+1} \cdot \frac{(i+1)^2}{4i} \\
&= \frac{n+1}{4} \cdot \frac{i-1}{i}.
\end{align*}



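Combining both cases, 

\begin{align*}
E[P_{OP}] &= \sum_{i=1}^{n-1} E[A_{OP}(i) \mid \text{Interval $i$ is active}] \cdot \mathrm{Prob}[\text{Interval $i$ is active}] \\
&= \frac{n+1}{8} \left( \sum_{k=1}^{\lfloor (n-2)/2 \rfloor} \frac{2k}{2k+1} + \sum_{k=1}^{\lfloor (n-1)/2 \rfloor} \frac{2k}{2k+1} \right),
\end{align*}

where the first sum accounts for the odd terms of the original sum, and the second 
sum accounts for the even terms. 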
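When $n$ is even this sum becomes 

\begin{align*}
E[P_{OP}] &= \frac{n+1}{8} \left( \sum_{k=1}^{\lfloor (n-2)/2 \rfloor} \frac{2k}{2k+1} + \sum_{k=1}^{\lfloor (n-1)/2 \rfloor} \frac{2k}{2k+1} \right) \\
&= \frac{n+1}{8} \left( 2 \sum_{k=1}^{(n-2)/2} \frac{2k}{2k+1} \right) \\
&= \frac{n+1}{8} \left( n - 2 - 2 \sum_{k=1}^{(n-2)/2} \frac{1}{2k+1} \right) \\
&= \frac{n+1}{8} \left( n + H_{(n-2)/2} - 2 H_{n-1} \right),
\end{align*}

which agrees with the claim in the lemma. When $n$ is odd the sum can be simplified 
as 

\begin{align*}
E[P_{OP}] &= \frac{n+1}{8} \left( \sum_{k=1}^{(n-3)/2} \frac{2k}{2k+1} + \sum_{k=1}^{(n-1)/2} \frac{2k}{2k+1} \right) \\
&= \frac{n+1}{8} \left( 2 \sum_{k=1}^{(n-1)/2} \frac{2k}{2k+1} - \frac{n-1}{n} \right) \\
&= \frac{n+1}{8} \left( n + H_{(n-1)/2} - 2 H_n + \frac{1}{n} \right),
\end{align*}

which again agrees with the claim in the lemma. The simplified forms follow from the fact 
that for any odd $n \ge 3$ we can bound $\frac{1}{2} < H_n - H_{(n-1)/2} < \ln 2 + \frac{1}{n}$. □ 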
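
The closed form in Lemma 3.2 can be compared against an exhaustive simulation for small $n$. The sketch below is an illustration under one explicit assumption: a position that is still open when the last item arrives is closed at that item, which is what the amortized accounting over intervals $1, \ldots, n-1$ implicitly does. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def harmonic(m):
    """m-th harmonic number as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, k) for k in range(1, m + 1))


def lemma_formula(n):
    """Closed form for E[P_OP] as stated in Lemma 3.2."""
    if n % 2 == 0:
        inner = n + harmonic((n - 2) // 2) - 2 * harmonic(n - 1)
    else:
        inner = n + harmonic((n - 1) // 2) - 2 * harmonic(n) + Fraction(1, n)
    return Fraction(n + 1, 8) * inner


def final_ranks(x):
    order = []
    for i, xi in enumerate(x, start=1):
        order.insert(xi - 1, i)
    ranks = [0] * len(x)
    for rank, item in enumerate(order, start=1):
        ranks[item - 1] = rank
    return ranks


def op_profit(x):
    """Profit of OP on relative ranks x; an open position is closed at item n."""
    r, n = final_ranks(x), len(x)
    profit, low = 0, None
    for i, xi in enumerate(x, start=1):
        if low is None and 2 * xi < i + 1:
            low = i
        elif low is not None and (2 * xi > i + 1 or i == n):
            profit += r[i - 1] - r[low - 1]
            low = None
    return profit


def expected_op_profit(n):
    seqs = list(product(*(range(1, j + 1) for j in range(1, n + 1))))
    return Fraction(sum(op_profit(x) for x in seqs), len(seqs))


for n in range(2, 8):
    assert expected_op_profit(n) == lemma_formula(n)
print("simulation matches Lemma 3.2 for n = 2..7")
```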

Combining this result with that of §3.1, we see that our on-line algorithm has 
expected profit 3/4 of what could be obtained with full knowledge of the future. In 
terms of competitive analysis, our algorithm has competitive ratio 4/3, which means 
that not knowing the future is not terribly harmful in this problem! 

3.3. Optimality of Our On-Line Algorithm. This section proves that algorithm 
OP is optimal. We will denote permutations by a small Greek letter with a 
subscript giving the size of the permutation; in other words, a permutation on the set 
$\{1, 2, \ldots, i\}$ may be denoted $\rho_i$ or $\sigma_i$. 

A permutation on $i$ items describes fully the first $i$ inputs to our problem, and 
given such a permutation we can also compute the permutation described by the first 
$i - 1$ inputs (or $i - 2$, etc.). We will use the notation $\sigma_i|_{i-1}$ to denote such a restriction. 
This is not just a restriction of the domain of the permutation to $\{1, \ldots, i - 1\}$, since 
unless $\sigma_i(i) = i$ this simplistic restriction will not form a valid permutation. 

Upon seeing the ith input, an algorithm may make one of the following moves: it 
may make this input a low selection; it may make this input a high selection; or it may 
simply ignore the input and wait for the next input. Therefore, any algorithm can 
be entirely described by a function which maps permutations (representing inputs 
of arbitrary length) into this set of moves. We denote such a move function for 
algorithm $B$ by $M_B$, which maps any permutation $\sigma_i$ to an element $M_B(\sigma_i)$ of the 
set {"low", "high", "wait"}. Notice that not all move functions give valid algorithms. 
For example, it is possible to define a move function that makes two low selections in 
a row for certain inputs, even though this is not allowed by our problem. 

We define a generic holding state just as we did for our algorithm. An algorithm 
is in the holding state at time i if it has made a low selection, but has not yet made a 
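corresponding high selection. For algorithm $B$ we define the set $L_B(i)$ to be the set of 
permutations on $i$ items that result in the algorithm being in the holding state after 
processing these $i$ inputs. We explicitly define these sets using the move function: 

\[ L_B(i) = \begin{cases} \{ \sigma_i \mid M_B(\sigma_i) = \text{``low''} \} & \text{if } i = 1, \\ \{ \sigma_i \mid M_B(\sigma_i) = \text{``low'' or } (M_B(\sigma_i) = \text{``wait'' and } \sigma_i|_{i-1} \in L_B(i-1)) \} & \text{if } i > 1. \end{cases} \]

The $L_B(i)$ sets are all we need to compute the expected amortized profit for interval 
$i$, since 

\begin{align*}
E[A_B(i)] &= \mathrm{Prob}[\text{Interval $i$ is active}] \cdot E[R_{i+1} - R_i \mid \text{Interval $i$ is active}] \\
&= \frac{|L_B(i)|}{i!} \left( \frac{n+1}{2} - \frac{1}{|L_B(i)|} \sum_{\rho_i \in L_B(i)} \frac{n+1}{i+1} \rho_i(i) \right) \\
&= \frac{n+1}{i!} \sum_{\rho_i \in L_B(i)} \left( \frac{1}{2} - \frac{1}{i+1} \rho_i(i) \right).
\end{align*}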
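
As an illustration of this formula, the Python sketch below evaluates the sum over the holding set for algorithm OP and compares it with the per-interval values derived in §3.2. It represents each permutation $\rho_i$ by its sequence of relative ranks, so that $\rho_i(i)$ is simply the last relative rank; that representation and the function names are assumptions made here.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def op_holding_after(x):
    """True if OP is in the holding state after seeing relative ranks x."""
    holding = False
    for i, xi in enumerate(x, start=1):
        if 2 * xi < i + 1:
            holding = True    # low rank: make (or keep) a low selection
        elif 2 * xi > i + 1:
            holding = False   # high rank: make a high selection (or stay free)
    return holding


def expected_amortized_profit(n, i):
    """E[A_OP(i)] as (n+1)/i! times the sum of 1/2 - rho(i)/(i+1) over L_OP(i)."""
    total = Fraction(0)
    for x in product(*(range(1, j + 1) for j in range(1, i + 1))):
        if op_holding_after(x):
            total += Fraction(1, 2) - Fraction(x[-1], i + 1)
    return Fraction(n + 1, factorial(i)) * total


n = 9
for i in range(1, 7):
    if i % 2 == 0:
        expected = Fraction(n + 1, 8) * Fraction(i, i + 1)
    else:
        expected = Fraction(n + 1, 8) * Fraction(i - 1, i)
    assert expected_amortized_profit(n, i) == expected
print("holding-set formula agrees with the Case 1 and Case 2 values")
```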




We use the above notation and observations to prove the optimality of algorithm OP. 

Theorem 3.3. Algorithm OP is an optimal algorithm for the multiple pair se- 
lection problem. 

Proof. Since the move functions (which define specific algorithms) work on permu- 
tations, we will fix an ordering of permutations in order to compare strategies. We or- 
der permutations first by their size, and then by a lexicographic ordering of the actual 
permutations. When comparing two different algorithms $B$ and $C$, we start enumerating 
permutations in this order and count how many permutations cause the same 
move in $B$ and $C$, stopping at the first permutation $\sigma_i$ for which $M_B(\sigma_i) \ne M_C(\sigma_i)$, 
i.e., the first permutation for which the algorithms make different moves. We call the 
number of permutations that produce identical moves in this comparison process the 
length of agreement between $B$ and $C$. 

To prove the optimality of our algorithm by contradiction, we assume that it is 
not optimal, and of all the optimal algorithms let $B$ be the algorithm with the longest 
possible length of agreement with our algorithm OP. Let $\sigma_k$ be the first permutation 
for which $M_B(\sigma_k) \ne M_{OP}(\sigma_k)$. Since $B$ is different from OP at this point, at least 
one of the following cases must hold: 

(a) $\sigma_k|_{k-1} \notin L_B(k-1)$ and $\sigma_k(k) < \frac{k+1}{2}$ and $M_B(\sigma_k) \ne$ "low" (i.e., algorithm 
$B$ is not in the holding state, gets a low rank input, but does not make it a low 
selection). 

(b) $\sigma_k|_{k-1} \notin L_B(k-1)$ and $\sigma_k(k) \ge \frac{k+1}{2}$ and $M_B(\sigma_k) \ne$ "wait" (i.e., algorithm 
$B$ is not in the holding state, gets a high rank input, but makes it a low selection 
anyway). 

(c) $\sigma_k|_{k-1} \in L_B(k-1)$ and $\sigma_k(k) > \frac{k+1}{2}$ and $M_B(\sigma_k) \ne$ "high" (i.e., algorithm 
$B$ is in the holding state, gets a high rank input, but doesn't make it a high selection). 

(d) $\sigma_k|_{k-1} \in L_B(k-1)$ and $\sigma_k(k) \le \frac{k+1}{2}$ and $M_B(\sigma_k) \ne$ "wait" (i.e., algorithm 
$B$ is in the holding state, gets a low rank input, but makes it a high selection 
anyway). 

In each case, we will show how to transform algorithm B into a new algorithm C 
such that C performs at least as well as B, and the length of agreement between C 
and OP is longer than that between B and OP. This provides the contradiction that 
we need. 

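Case (a): Algorithm $C$'s move function is identical to $B$'s except for the following 
values: 

\begin{align*}
M_C(\sigma_k) &= \text{``low''}, \\
M_C(\rho_{k+1}) &= \begin{cases} \text{``high''} & \text{if } \rho_{k+1}|_k = \sigma_k \text{ and } M_B(\rho_{k+1}) = \text{``wait''}, \\ \text{``wait''} & \text{if } \rho_{k+1}|_k = \sigma_k \text{ and } M_B(\rho_{k+1}) = \text{``low''}, \\ M_B(\rho_{k+1}) & \text{otherwise.} \end{cases}
\end{align*}

In other words, algorithm $C$ is the same as algorithm $B$ except that we 
"correct $B$'s error" of not having made this input a low selection. The changes 
of the moves on input $k + 1$ ensure that $L_C(k+1)$ is the same as $L_B(k+1)$. 
It is easily verified that the new sets $L_C(i)$ (corresponding to the holding 
state) are identical to the sets $L_B(i)$ for all $i \ne k$. The only difference at $k$ is 
the insertion of $\sigma_k$, i.e., $L_C(k) = L_B(k) \cup \{\sigma_k\}$. 

Let $P_B$ and $P_C$ be the profits of $B$ and $C$, respectively. Since their amortized 
profits differ only at interval $k$, 

\begin{align*}
E[P_C - P_B] &= E[A_C(k)] - E[A_B(k)] \\
&= \frac{n+1}{k!} \sum_{\rho_k \in L_C(k)} \left( \frac{1}{2} - \frac{1}{k+1} \rho_k(k) \right) - \frac{n+1}{k!} \sum_{\rho_k \in L_B(k)} \left( \frac{1}{2} - \frac{1}{k+1} \rho_k(k) \right) \\
&= \frac{n+1}{k!} \left( \frac{1}{2} - \frac{1}{k+1} \sigma_k(k) \right).
\end{align*}

By one of the conditions of Case (a), $\sigma_k(k) < \frac{k+1}{2}$, so we finish this derivation 
by noting that 

\[ E[P_C - P_B] = \frac{n+1}{k!} \left( \frac{1}{2} - \frac{1}{k+1} \sigma_k(k) \right) > \frac{n+1}{k!} \left( \frac{1}{2} - \frac{1}{k+1} \cdot \frac{k+1}{2} \right) = 0. \]

Therefore, the expected profit of algorithm $C$ is greater than that of $B$. 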
Case (b): As in Case (a) we select a move function for algorithm $C$ that causes 
only one change in the sets of holding states, having algorithm $C$ not make 
input $k$ a low selection. In particular, these sets are identical with those of 
algorithm $B$ with the one exception that $L_C(k) = L_B(k) - \{\sigma_k\}$. Analysis 
similar to Case (a) shows 

\[ E[P_C - P_B] = \frac{n+1}{k!} \left( \frac{1}{k+1} \sigma_k(k) - \frac{1}{2} \right) \ge \frac{n+1}{k!} \left( \frac{1}{k+1} \cdot \frac{k+1}{2} - \frac{1}{2} \right) = 0. \]



Case (c): In this case we select a move function for algorithm $C$ such that $L_C(k) = 
L_B(k) - \{\sigma_k\}$, resulting in algorithm $C$ selecting input $k$ as a high selection, 
and giving an expected profit gain of 

\[ E[P_C - P_B] = \frac{n+1}{k!} \left( \frac{1}{k+1} \sigma_k(k) - \frac{1}{2} \right) > \frac{n+1}{k!} \left( \frac{1}{k+1} \cdot \frac{k+1}{2} - \frac{1}{2} \right) = 0. \]

Case (d): In this case we select a move function for algorithm $C$ such that $L_C(k) = 
L_B(k) \cup \{\sigma_k\}$, resulting in algorithm $C$ not taking input $k$ as a high selection, 
and giving an expected profit gain of 

\[ E[P_C - P_B] = \frac{n+1}{k!} \left( \frac{1}{2} - \frac{1}{k+1} \sigma_k(k) \right) \ge \frac{n+1}{k!} \left( \frac{1}{2} - \frac{1}{k+1} \cdot \frac{k+1}{2} \right) = 0. \]

In each case, we transformed algorithm B into a new algorithm C that performs 
at least as well (and hence must be optimal), and has a longer length of agreement 
with algorithm OP than B does. This directly contradicts our selection of B as the op- 
timal algorithm with the longest length of agreement with OP, and this contradiction 
finishes the proof that algorithm OP is optimal. □ 
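
The theorem can also be sanity-checked numerically by a route different from the exchange argument above. Under the amortized accounting, holding across interval $i$ after seeing $x_i$ contributes $(n+1)\left(\frac{1}{2} - \frac{x_i}{i+1}\right)$ in expectation, so no strategy can beat the sum of the positive parts of these contributions. The sketch below, with hypothetical function names and assuming the closed form of Lemma 3.2 as reconstructed in the text, checks that this upper bound coincides with the expected profit of OP.

```python
from fractions import Fraction


def harmonic(m):
    """m-th harmonic number as an exact fraction (H_0 = 0)."""
    return sum(Fraction(1, k) for k in range(1, m + 1))


def lemma_formula(n):
    """Closed form for E[P_OP] from Lemma 3.2."""
    if n % 2 == 0:
        inner = n + harmonic((n - 2) // 2) - 2 * harmonic(n - 1)
    else:
        inner = n + harmonic((n - 1) // 2) - 2 * harmonic(n) + Fraction(1, n)
    return Fraction(n + 1, 8) * inner


def best_possible_profit(n):
    """Upper bound: expected positive part of each interval's contribution."""
    total = Fraction(0)
    for i in range(1, n):            # intervals 1..n-1
        for xi in range(1, i + 1):   # x_i is uniform on 1..i
            gain = (n + 1) * (Fraction(1, 2) - Fraction(xi, i + 1))
            total += max(gain, Fraction(0)) / i
    return total


for n in range(2, 41):
    assert best_possible_profit(n) == lemma_formula(n)
print("upper bound equals E[P_OP] for n = 2..40")
```

The bound is attained because OP holds exactly when the contribution is positive, and is indifferent when it is zero.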

4. Conclusion. In this paper, we examined a natural on-line problem related 
to both financial games and the classic secretary problem. We select low and high 
values from a randomly ordered set of values presented in an on-line fashion, with 
the goal of maximizing the difference in final ranks of such low/high pairs. We con- 
sidered two variations of this problem. The first allowed us to choose only a single 
low value followed by a single high value from a sequence of n values, while the 
second allowed selection of arbitrarily many low/high pairs. We presented provably 
optimal algorithms for both variants, gave tight analyses of the performance of these 
algorithms, and analyzed how well the on-line performance compares to the optimal 
off-line performance. 

Our paper opens up many problems. Two particularly interesting directions are 
to consider more realistic input sources and to maximize quantities other than the 
difference in rank. 

Appendix. Proof of Expected Final Rank. In this appendix section, we 
prove that if an item has relative rank $x_i$ among the first $i$ inputs, then its expected 
rank $r_i$ among all $n$ inputs is given by $E[r_i \mid x_i] = \frac{n+1}{i+1} x_i$. 

Lemma A.1. If a given item has rank $x$ from among the first $i$ inputs, and if the 
$(i+1)$st input is uniformly distributed over all possible rankings, then the expected rank 
of the given item among the first $i + 1$ inputs is $\frac{i+2}{i+1} x$. 

Proof. If we let R be a random variable denoting the rank of our given item from 
among the first i + 1 inputs, then we see that the value of R depends on the rank of 
the $(i+1)$st input. In particular, if the rank of the $(i+1)$st input is $\le x$ (which happens 
with probability $\frac{x}{i+1}$), then the new rank of our given item will be $x + 1$. On the 
other hand, if the rank of the $(i+1)$st input is $> x$ (which happens with probability 
$\frac{i+1-x}{i+1}$), then the rank of our given item is still $x$ among the first $i + 1$ inputs. Using 
this observation, we see that 

\[ E[R] = \frac{x}{i+1} (x + 1) + \frac{i+1-x}{i+1} \, x = \frac{x + 1 + i + 1 - x}{i+1} \, x = \frac{i+2}{i+1} \, x, \]

which is what is claimed in the lemma. □ 

For a fixed position i, the above extension of rank to position i + 1 is a constant 
times the rank of the item among the first i inputs. Because of this, we can simply 
extend this lemma to the case where x is not a fixed rank but is a random variable, 
and we know the expected rank among the first i items. 

Corollary A.2. If a given item has expected rank $x$ from among the first $i$ 
inputs, and if the $(i+1)$st input is uniformly distributed over all possible rankings, then 
the expected rank of the given item among the first $i + 1$ inputs is $\frac{i+2}{i+1} x$. 

Simply multiplying together the change in expected rank from among i inputs, 
to among i + 1 inputs, to among i + 2 inputs, and so on up to n inputs, we get 
a telescoping product with cancellations between successive terms, resulting in the 
following corollary. 

Corollary A.3. If a given item has rank $x$ from among the first $i$ inputs, and 
if the remaining inputs are uniformly distributed over all possible rankings, then the 
expected rank of the given item among all $n$ inputs is $\frac{n+1}{i+1} x$. 
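
Written out, the telescoping product behind this corollary is the following, where each factor is the one-step growth from Corollary A.2 applied with $i, i+1, \ldots, n-1$ inputs seen so far:

```latex
\[
x \cdot \prod_{j=i}^{n-1} \frac{j+2}{j+1}
  = x \cdot \frac{i+2}{i+1} \cdot \frac{i+3}{i+2} \cdots \frac{n+1}{n}
  = \frac{n+1}{i+1} \, x .
\]
```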